Abstract

In this paper we present a method for localising facial landmarks on
humans and sheep. We introduce a new feature extraction scheme, the
triplet-interpolated feature (TIF), which is used at each iteration of
the cascaded shape regression framework. It is able to extract features
from similar semantic locations given an estimated shape, even when
head pose variations are large and the facial landmarks are very
sparsely distributed. Furthermore, we study the impact of training data
imbalance on model performance and propose a training sample
augmentation scheme that produces more initialisations for training
samples from the minority. More specifically, the augmentation number
for a training sample is made to be negatively correlated with the
value of the fitted probability density function at the sample's
position. We evaluate the proposed scheme on both human and sheep
facial landmark localisation. On the benchmark 300w human face dataset,
we demonstrate the benefits of our proposed methods and show very
competitive performance compared with other methods. On a newly created
sheep face dataset, we obtain very good performance despite having only
a limited number of training samples and a sparse set of annotated
landmarks.


1. Introduction 

Many computer vision applications require the localisation of a set of
landmarks for the purpose of fine-grained recognition, for example
joint localisation in human pose estimation [?] and part localisation
for bird [?] and dog [?] breed recognition. It is of interest to
localise facial landmarks for animals and humans, given that their
faces hold rich information such as identity, expression and health
condition. In this paper, we are interested in localising sheep and
human facial landmarks for real applications. Sheep facial landmark
localisation is new in the computer vision field and has very promising
potential in animal welfare. Compared to other animals, sheep have less
intricate facial muscles and thus do not appear to have a wide array of
facial expressions. However, researchers have linked a few specific
postures with emotional experiences; for example, a backward ear
posture, which is associated with unfamiliar and uncontrollable
unpleasant situations, could express fear [7]. Identifying pain or
suffering in animals such as sheep is an essential aspect of animal
welfare and is very helpful to both researchers and farmers. In the
example shown in Fig. 1, the sheep on the right is suffering heavily
from pain while the sheep on the left is in a normal condition. Experts
in animal welfare research are able to pick up several distinguishable
patterns of sheep in pain, such as orbital tightening, abnormal ear
position, and abnormal nostril and philtrum shape. In order to identify
those features automatically, it is essential to localise the
corresponding landmarks on the sheep face, a task that is conceptually
very similar to human facial landmark localisation (also called face
alignment). As a classical problem in computer vision, face alignment
has been intensively studied in the past decades owing to its wide
applications, for example face recognition, facial expression
recognition and avatar animation. Several recent methods such as
[?, 22, 30, 40, 38, ?] have reported close-to-human performance on
academic databases such as LFW [21], LFPW [?] and HELEN [24].

Figure 1: Normal sheep (left) vs. sheep in pain (right). The red
landmarks are associated with distinguishable patterns that we intend
to localise.



However, we meet several obstacles when we apply state-of-the-art
algorithms directly to real data, for both human and sheep facial
landmark localisation. First, unlike the benchmark datasets for human
face alignment, in which a large number of landmarks are often
annotated, the number of facial landmarks in practice is usually
smaller, owing to the annotation cost and the smaller number of
landmarks of interest. Second, both human and sheep faces show large
head pose variations in the real world because the subjects cannot be
controlled. This usually results in localisation failures.

In this paper we deal with the problems mentioned above. We build our
localisation algorithm on top of the Cascaded Pose Regression (CPR)
framework, given its good performance in facial landmark localisation
in the wild. There has been a series of works with incremental
improvements, one after the other, including [?]. The most recent work,
RCPR [8], introduced interpolated shape-indexed features used in each
regression. It demonstrated better robustness against large pose
variations and shape deformations than the closest-landmark-indexed
feature in [9]. However, the two-point interpolation method limits the
feature extraction space, especially when the number of facial
landmarks is small. Landmark sparsity is often the case when a new
training dataset must be annotated in a limited amount of time or when
only a small number of landmarks are needed. To overcome these issues
we make the following contributions.

• We propose a new feature extraction scheme, called the
triplet-interpolated feature (TIF), for cascaded pose regression. It
uses three anchor landmarks to calculate a shape-indexed feature. It is
more robust to large head pose variation and shape deformation. More
importantly, with this scheme, features can be extracted from the
facial area with no restriction.

• We propose an augmentation scheme for training samples to deal with
the issue of imbalanced training data distribution. This scheme sets
the augmentation number of each training sample to be negatively
correlated with its value in the probability density function of the
training data. More intuitively, we augment the minority training
samples with more random initialisations and the majority samples with
fewer.

We have carried out experiments on both human and sheep facial landmark
localisation and demonstrate the benefits of our proposed methods in
the presence of sparse landmarks and large head pose variations. Our
approach also shows competitive overall performance compared with other
related methods.

The remainder of the paper is organized as follows. In Section 2 we
present related work. Then we introduce the triplet-interpolated
features and the augmentation scheme in Section 3. In Section 4 we
evaluate our proposed methods on both human and sheep facial landmark
localisation, and in Section 5 we draw some conclusions.

2. Related work 

2.1. Facial landmarks localisation 

Facial landmark localisation has made considerable progress in recent
years and a large number of methods have been proposed. Two sources of
information are usually used: facial appearance and shape information.
Based on whether or not a method has an explicit detection model for
each individual landmark, we categorise methods into local-based
methods and holistic-based methods. Methods in the former category
usually rely on explicit discriminative local detection and use
deformable shape models to regularise the local outputs, while methods
in the latter category directly regress the shape (the representation
of the facial landmark locations) in a holistic way.

Local-based methods usually consist of two parts: local experts and
spatial shape models. The former describe what the image around each
facial landmark looks like in terms of local intensity or colour
patterns, while the latter describe how the face shape varies. There
are three main types of local feature detection. (1) Classification
methods include Support Vector Machine (SVM) classifiers [?] based on
various image features such as Gabor [37] and SIFT [27], Discriminative
Response Map Fitting (DRMF) by dictionary learning [?] and multichannel
correlation filter responses [?]. (2) Regression-based approaches
include Support Vector Regressors (SVRs) [28] with a probabilistic
MRF-based shape model and Continuous Conditional Neural Fields (CCNF)
[1]. (3) Voting-based approaches have also been introduced in recent
years, including regression-forest-based voting methods [?] and
exemplar-based voting methods [?, 32]. One typical shape model is the
Constrained Local Model (CLM) [?]. There are some other shape models
such as RANSAC in [?], graph matching in [44], the Gauss-Newton
Deformable Part Model (GNDPM) [?] and the mixture of trees [46].

Holistic methods have gained popularity in recent years. Most of them
work in a cascaded way similar to the classical Active Appearance Model
(AAM) [?]. We list very recent holistic methods and their properties in
Table 1. These methods work in a similar cascaded framework but differ
from each other mainly in three aspects: first, how the initialisations
are set up; second, how the shape-indexed features are calculated;
third, what type of regressor is applied at each iteration. Feature
extraction and regression are usually interdependent. As can be seen,
several methods use simple pixel difference (diff.) features calculated
from the current shape. Random ferns and random trees are widely used
for regression. Using raw pixel difference features makes the algorithm
very efficient. In our testing, C++ implementations of ESR, RCPR, LBF
and TREES process a standard face image in milliseconds on a single
core of an i7 desktop. This is a great advantage in systems designed to
process a large number of faces, for example to analyse a group of
sheep at the same time. SDM has been widely applied given the good
performance of its publicly available model. It runs at around 30
frames per second. TCDCN applies a deep learning approach to face
alignment through multi-task learning, but training such a model
usually requires a big dataset with multiple additional annotations
such as head pose and with/without glasses.

Table 1: Holistic methods and their properties.

Methods    RCPR [8]      ESR [9]       LBF [30]          TREES [?]     SDM [38]  TCDCN [43]
features   pixel diff.   pixel diff.   forest on pixels  pixel         SIFT      ConvNet feature
regressor  random ferns  random ferns  linear            random trees  linear    ConvNet

There are several other approaches to holistic face alignment, such as
the occlusion-detection-based methods of [19, 40], the combined local
and holistic method in [?], and SDM variants including the global SDM
and shape searching in [?]. Owing to their different settings and the
limited space, we do not compare them in our experiments.

2.2. Data imbalance 

The data imbalance problem is of particular importance in real-world
scenarios, as the available data usually follow a long-tail
distribution. Data imbalance has been widely studied in classification
problems, where a few classes are abundant while others only have a
limited number of samples [?]. State-of-the-art solutions include
sampling methods (e.g. under-sampling [26] and SMOTE over-sampling
[10]) and cost-sensitive learning [?]. In contrast, very little
attention has been paid to data imbalance in regression problems such
as our facial landmark localisation. This is mainly because data
imbalance is difficult to notice given the continuity and the usually
high dimensionality of the output space. Thus in this paper we
investigate how to adapt the approaches for tackling class imbalance to
the regression problem.

3. Method 

In this section, we first briefly review the general cascaded pose
regression (CPR) approach, on which our localisation algorithm is
built. Then we introduce the triplet-interpolated features. Following
that, negatively correlated augmentation is discussed in detail as an
approach to dealing with imbalanced training data.


3.1. General CPR and RCPR 

The shape of a human or sheep face is represented as a vector of
landmark locations, i.e., $S = (y_1, \ldots, y_k, \ldots, y_K)^T$,
where $K$ is the number of landmarks and $y_k \in \mathbb{R}^2$ is the
2D coordinates of the $k$-th landmark. CPR is formed by a cascade of
$T$ regressors, $R^{1 \ldots T}$. Shape estimation starts from an
initial shape $S^0$ and progressively refines the pose. Each regressor
refines the pose by producing an update, $\Delta S$, which is added to
the current shape estimate, that is,

$S^t = S^{t-1} + \Delta S.$ (1)

The update $\Delta S$ is returned by the regressor, which takes the
previous pose estimate and the image $I$, from which the features are
computed, as inputs:

$\Delta S = R^t(S^{t-1}, I).$ (2)

CPR is summarized in Algorithm 1. The CPR framework differs from
classic boosted approaches mainly in the feature re-sampling process.
More specifically, instead of using fixed features, the input features
for each regressor are calculated relative to the current pose
estimate, which introduces geometric invariance into the cascade and
shows good performance in practice. Such features are often referred to
as pose-indexed features, as in [?]. The idea of sampling features
relative to the current pose estimate was later used in [?]. To
strengthen the geometric invariance, instead of extracting features
relative to the closest landmarks, RCPR [8] utilizes a different
feature-indexing method, namely interpolated shape-indexed features,
which are extracted with reference to two shape points. RCPR is shown
in [8] to be more robust to large pose variations than general CPR.

Algorithm 1 Cascaded Pose (shape) Regression

Require: Image $I$, initial pose $S^0$
Ensure: Estimated pose $S^T$
1: for $t = 1$ to $T$ do
2:     $f^t = h^t(I, S^{t-1})$          ▷ Shape-indexed features
3:     $\Delta S = R^t(f^t)$            ▷ Apply regressor $R^t$
4:     $S^t = S^{t-1} + \Delta S$       ▷ Update shape
5: end for
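The loop in Algorithm 1 can be sketched in a few lines of Python. This is only an illustration: the function name `run_cascade`, the callables standing in for $h^t$ and $R^t$, and the toy stages in the usage lines are assumptions made here, not the trained fern or forest regressors used in the paper.

```python
import numpy as np

def run_cascade(image, init_shape, feature_fns, regressors):
    """Apply T stages: S^t = S^{t-1} + R^t(h^t(I, S^{t-1}))."""
    shape = np.asarray(init_shape, dtype=float).copy()  # (K, 2) landmark array
    for h_t, r_t in zip(feature_fns, regressors):
        features = h_t(image, shape)   # shape-indexed features f^t
        shape = shape + r_t(features)  # add the predicted update Delta S
    return shape

# Toy usage with stand-in stages (hypothetical, not trained models):
# every "regressor" moves the shape halfway towards a fixed target.
target = np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 20.0]])
h = lambda image, shape: shape            # features are the current shape
r = lambda feats: 0.5 * (target - feats)  # halve the remaining error
estimate = run_cascade(None, np.zeros((3, 2)), [h] * 10, [r] * 10)
```

In the toy usage the remaining error halves at every stage, which mirrors the progressive refinement described above; a real cascade would learn each stage from training data.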


3.2. Triplet-Interpolated Feature (TIF) 

The above CPR scheme and its variants are very popular given their high
computational efficiency and localisation accuracy. In each iteration,
random ferns or random forests take raw pixel values as input features,
so the way these pixels are indexed is essential to fast convergence of
the cascaded learning. Prevalent pixel-indexing features are intended
to be invariant with respect to pose variation. That is to say, pixels
indexed with reference to the same shape points are expected to have
the same semantic meaning across different samples. Such efforts have
been made in [9], which applied shape-indexed features, and in [8],
which achieved stronger geometric invariance with the interpolated
shape-indexed features.

Figure 2: Pixels indexed by the same local coordinates should have the
same semantic meaning. The triplet-interpolated feature shows its
invariance to large pose variation in the bottom-right figure.


However, the interpolated shape-indexed features in RCPR have a
fundamental drawback: they can only be drawn from the line segment
between two landmarks. As the example in Fig. 3(a) shows, features can
be extracted from a rich area of the face when the landmarks are dense.
However, this becomes problematic when the facial landmarks are sparse,
because features can only be extracted from very restricted locations
(see Fig. 3(b)). This limits the randomness of feature extraction.


Figure 3: The red lines in (a) and (b) show the available area for
feature extraction when we use the linear-interpolated shape-indexed
features. (c) and (d) illustrate the concept of our triplet-interpolated
features and their available feature region. (b) and (d) together show
how the new indexing method extends the available area for feature
extraction when the shape annotation is sparse.

To combine the benefits of geometric invariance and avoid its
limitations, we propose a new indexing approach, namely the
triplet-interpolated feature (TIF), as shown in Fig. 3. The indexing
process works in the following way. Out of every group of three
randomly selected landmarks, one is randomly chosen and assigned as the
primary point. Then the two vectors from the primary point to the other
two landmarks span the whole plane by linear combination, provided the
three landmarks are not collinear. By setting the parameters of the
linear combination, a position can be selected within the spanned area,
as shown in Fig. 3. The location of the point $p$ indexed by TIF is
represented as:

$p(S, i, j, k, \alpha, \beta) = y_i + \alpha \cdot v_{i,j} + \beta \cdot v_{i,k}$ (3)

where $S$ is the current shape and $i$, $j$, $k$ are landmark indexes.
$v_{i,j} = y_j - y_i$ is the vector from the position of landmark $i$
to the position of landmark $j$. $\alpha$ and $\beta$ are random ratios
that control the position of the indexed point. Compared to the
original closest-landmark-indexed feature in [9], TIF has two main
advantages: 1) it is computationally cheaper since it does not have the
shape transformation step; 2) it is more robust to large head pose
variation given the triplet interpolation property, as shown in Fig. 2.
Compared to the two-point-interpolated feature in RCPR [8], it is able
to extract features from a much wider range, especially when the
landmarks are sparse. We show the benefits of using TIF in the
experiment section. Apart from the feature extraction process, we
follow the cascaded pose regression process used by ESR [9] and RCPR
[8]. Note that in this paper we only use the feature extraction part of
RCPR, as the occlusion estimation part requires landmark-wise occlusion
annotation. In this way we also make the benefits of feature extraction
clearer.
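As an illustration of the triplet indexing in Eq. (3), the following minimal Python sketch computes the indexed location from three landmarks and two ratios. The function names and the sampling range for the ratios are assumptions made for this example; the paper does not state the range from which $\alpha$ and $\beta$ are drawn.

```python
import numpy as np

def tif_location(shape, i, j, k, alpha, beta):
    """Eq. (3): p = y_i + alpha * (y_j - y_i) + beta * (y_k - y_i)."""
    y_i, y_j, y_k = shape[i], shape[j], shape[k]
    return y_i + alpha * (y_j - y_i) + beta * (y_k - y_i)

def sample_tif_params(num_landmarks, rng, low=-0.5, high=1.5):
    """Draw a random landmark triplet and two ratios.

    The interval [low, high] is a hypothetical choice for illustration.
    """
    i, j, k = rng.choice(num_landmarks, size=3, replace=False)
    alpha, beta = rng.uniform(low, high, size=2)
    return int(i), int(j), int(k), float(alpha), float(beta)

# Three landmarks; the point halfway along both spanning vectors.
shape = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 2.0]])
p = tif_location(shape, 0, 1, 2, 0.5, 0.5)  # -> [2.0, 1.0]

# Random feature location for a sparse 8-landmark shape.
params = sample_tif_params(8, np.random.default_rng(0))
```

Because the indexed point is a combination of the three landmarks with weights $(1 - \alpha - \beta)$, $\alpha$ and $\beta$ that sum to one, it moves consistently with the landmarks under any affine change of the shape, which is one way to read the robustness to head pose variation discussed above.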







3.3. Negatively Correlated Augmentation (NCA) 

Before introducing our data augmentation scheme, we first analyse the
data distribution of 300w, the benchmark database for human facial
landmark localisation. It consists of face images from AFW [46], HELEN
[24], LFPW [?] and the newly annotated iBug [?]. We partition it into
3148 training images and 689 test images. Training images are from AFW
(337 images), the HELEN training set (2000 images) and the LFPW
training set (811 images), and test images are from the HELEN test set
(330 images), the LFPW test set (224 images) and iBug (135 images).

Because it is impractical to analyse the data distribution directly in
the output space given its high dimensionality, we ignore individual
face differences and small facial deformations. The distribution of
facial landmarks is then mainly affected by head pose variations, which
lie in a low-dimensional manifold. Therefore, we analyse the
distribution of head poses. Since head pose is not provided by the
database, we estimate the head pose of each face from the annotated
facial landmarks. To this end, we fit a mean 3D model (68 facial
points) of a head to the annotated points in the image. We then feed
the set of corresponding 3D and 2D points to the POSIT [?] algorithm,
which produces the head pose information.

As shown in Fig. 4, the majority of training samples are distributed
near frontal angles. More than 97% of the samples lie within the roll
angle range between -20° and 20°. For the pitch and yaw angles, the
percentages are 83% and 76% respectively. For each training sample, we
calculate the most significant rotation angle, i.e., the angle with the
biggest absolute value. Then we fit a Gaussian curve to all the
training samples, as shown in Fig. 4(d).
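The selection of the most significant angle and the Gaussian fit can be sketched as follows. This is a minimal illustration under assumptions: the head poses are taken as an array of (pitch, yaw, roll) angles in degrees, and the Gaussian is fitted by the sample mean and standard deviation, which is one common choice; the paper does not specify how the curve was fitted.

```python
import numpy as np

def most_significant_angle(poses_deg):
    """Per sample, return the signed angle with the largest absolute value."""
    poses_deg = np.asarray(poses_deg, dtype=float)  # (N, 3): pitch, yaw, roll
    idx = np.argmax(np.abs(poses_deg), axis=1)
    return poses_deg[np.arange(len(poses_deg)), idx]

def fit_gaussian(angles):
    """Fit a Gaussian by sample mean and standard deviation (an assumption)."""
    angles = np.asarray(angles, dtype=float)
    return float(angles.mean()), float(angles.std())

def gaussian_pdf(x, mean, std):
    """Density of the fitted Gaussian at x."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Made-up poses for three samples, in degrees.
poses = [[5.0, -30.0, 2.0], [1.0, 2.0, -3.0], [10.0, 0.0, 0.0]]
angles = most_significant_angle(poses)  # -> [-30., -3., 10.]
mean, std = fit_gaussian(angles)
density = gaussian_pdf(angles, mean, std)
```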

We ran several models on the test images, including Explicit Shape
Regression (ESR) [9], Robust Cascaded Pose Regression (RCPR) [8], the
Supervised Descent Method (SDM) [38] and TCDCN [43]. We then recorded
their failures, i.e. samples with a mean localisation error bigger than
0.1 inter-ocular distance (IOD). The overall distribution is shown in
Fig. 4(e). Despite these methods being modelled in very different ways,
their failures are quite similar. Only a few failures are within the
angle range between -20° and 20°, where the majority of training
samples are distributed. We can therefore conclude that the imbalanced
distribution of training data has a heavy impact on testing
performance, regardless of the algorithm design.

Figure 4: (a), (b) and (c) show the histograms of the head pose pitch,
roll and yaw angles respectively in the 300w training set. (d) shows
the fitted Gaussian curve and the histogram of the most significant
angle of the training samples. (e) is the histogram of failures from
several state-of-the-art models trained on 300w (ESR, TREES, RCPR and
SDM).

Figure 5: Performance comparison on our sheep face dataset. (Lower is
better.)

Figure 7: Performance comparison on dense (left) and sparse (right)
facial landmarks. (Higher is better.)

In the framework of cascaded shape regression, data augmentation is
usually carried out at training time. More specifically, for one face
image sample, several initialisations are generated by a Monte Carlo
method. This procedure has been used in ESR, RCPR and SDM, and the
augmentation number is usually fixed. We propose a simple augmentation
scheme, under which the amount of augmentation of each training sample
is negatively correlated with the value on the fitted Gaussian curve
(Fig. 4(d)). Conceptually, this is similar to over-sampling in
classification problems, but each augmented sample is unique in our
case because of the difference in initialisation. More specifically,
the augmentation number $m_x$ of training sample $x$ is calculated as:

$m_x = a \cdot \mathcal{N}(x_{pose}) + b$ (4)

where $x_{pose}$ is the head pose of $x$, $\mathcal{N}(\cdot)$ is the
fitted Gaussian distribution, $a$ is a negative variable that controls
the slope and $b$ is a bias term that controls the bounds of the
augmentation numbers. We use two pairs of values (the maximum and the
minimum) to fit this linear equation, with the constraint that the
total number of samples after augmentation is equal to that of the
baseline augmentation scheme.
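One way to read Eq. (4) is as a line through two points: the sample with the highest fitted density receives the minimum augmentation number and the sample with the lowest density receives the maximum. The Python sketch below follows that reading. The function name `nca_counts` and the rounding to integers are assumptions, the default bounds 11 and 40 are the values reported in Section 4.2, and the sketch does not itself enforce the constraint on the total number of augmented samples.

```python
import numpy as np

def nca_counts(density, m_min=11, m_max=40):
    """Eq. (4): m_x = a * N(x_pose) + b with a < 0.

    The line is fitted through (highest density, m_min) and
    (lowest density, m_max); the densities must not all be equal.
    """
    density = np.asarray(density, dtype=float)
    d_lo, d_hi = density.min(), density.max()
    a = (m_min - m_max) / (d_hi - d_lo)  # negative slope
    b = m_max - a * d_lo                 # bias term
    return np.rint(a * density + b).astype(int)

# Made-up density values for three training samples.
density = np.array([0.01, 0.04, 0.03])
counts = nca_counts(density)  # -> [40, 11, 21]
```

To respect the constraint on the total, the mean of the returned counts can be compared with the fixed baseline augmentation number and the bounds adjusted until the two agree.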

4. Evaluation 

4.1. Sheep face experiment 

We collected 600 sheep face photos from an animal research centre. We
manually labelled the bounding boxes and 8 landmarks on the faces, as
shown in Fig. 1. We trained a structured SVM sheep face detector based
on HOG features using dlib [23]. A few hundred sheep face images are
sufficient to train a sheep face detector that can be used on real
videos. In our sheep facial landmark localisation experiments, as
usual, we assumed that the face bounding boxes are available. We
randomly split the 600 sheep faces into a training set (500) and a
testing set (100). We then trained our TIF model, ESR and RCPR on the
same training set, setting the augmentation number to 20 for all these
methods. We repeated this random process 5 times and recorded all the
results. Since our test set is not big, we directly report the sorted
sample-wise mean error (normalised by sheep face size) of the 100
images; for each index, the value is the average over the 5 runs. As
can be seen in Fig. 5, on a small dataset with sparse landmarks, our
method outperforms the baseline methods by a large margin. Around 90%
of the sheep images are localised with a mean error of less than 10% of
the face size. Some example images with comparison to other methods are
shown in Fig. 6. The sheep face images in our collected dataset exhibit
a wide range of diversity in sheep breed, facial colour, lighting
condition, background, occlusion, head pose, ear posture, etc.
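The reporting procedure described above (sort the per-image errors within each run, then average index by index across runs) can be sketched as follows. The function name and the toy numbers are illustrative assumptions, not results from the paper.

```python
import numpy as np

def sorted_error_curve(errors_per_run):
    """Sort the per-image errors within each run, then average index-wise."""
    errors = np.sort(np.asarray(errors_per_run, dtype=float), axis=1)
    return errors.mean(axis=0)

# Two made-up runs with three test images each (normalised mean errors).
runs = [[0.30, 0.10, 0.20],
        [0.10, 0.40, 0.20]]
curve = sorted_error_curve(runs)  # -> [0.10, 0.20, 0.35]
```

Sorting before averaging means that each index of the curve is a rank rather than a specific image, which suits repeated random splits in which the test images differ from run to run.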

4.2. Human face experiment 

In order to further evaluate the proposed schemes, we carry out
experiments on the human face alignment benchmark database 300w. Recall
that we split the publicly available database into a training set (3148
images) and a testing set (689 images). We have implemented and trained
the baseline models (ESR and RCPR) on the same training images.












Figure 8: Successful-localisation-rate comparison for methods with and
without our proposed Negatively Correlated Augmentation (NCA).


Note that when implementing the RCPR algorithm we only used its method
of feature indexing (interpolation between two landmarks) but not its
occlusion modelling, since there is no occlusion annotation for
training. Thus for ESR, RCPR and our TIF method, the only difference is
the feature extraction step. At testing time, we also initialised them
with the same random shapes for a fair comparison. We carried out two
groups of experiments. In the first group, the model was trained on 68
facial landmarks, and in the second group, we only used very sparse
landmarks, to simulate the case of sheep facial landmark localisation.
The mark-up of the sparse landmarks is shown in Fig. [?]; they are
distributed almost uniformly among the original 68 landmarks on the
face. Note that we use the face bounding box detected by the dlib face
detector [23], followed by a manual check for each face image. This is
more realistic in practice than using the tight bounding boxes
calculated from the annotated facial landmarks. In order to make a fair
comparison, we trained our model as well as the most competitive models
(highlighted in Section 2), including RCPR, SDM, TREES, ESR, CCNF and
LBF, with the same setting. More specifically, we use the same training
set and the same bounding box definition. For TCDCN, GNDPM and DRMF we
use their initial trained models, as their performance is less
competitive.

Figure 6: Landmark localisation on example sheep face images. From top
to bottom: the results of our TIF method, RCPR and ESR respectively.
The final column shows a failure example of our method.

As shown in Fig. 7: 1) our method (NCA + TIF) achieves the best
performance, although the improvement over the baseline RCPR method is
not large; 2) using TIF alone does not show superior performance over
RCPR in the dense landmarks setting, which is as expected. The benefit
of using TIF is clearer on sparse landmarks, as shown in Fig. 7
(right). Our proposed TIF improves on the baseline RCPR method as well
as the similar ESR method by a large margin. Note that there are some
tricks that can make cascaded pose regression methods more robust, such
as the smart restart in [8] and the mirrorability-based restart in
[42], which are naturally compatible with our TIF method as well. In
this evaluation we are more interested in the benefits brought by TIF.

We evaluate the NCA scheme with three methods, our proposed TIF, RCPR
and ESR, since they use the same way of data augmentation. We set the
smallest augmentation number to 11 and the biggest to 40 for the
training samples in our NCA method, which makes the total number equal
to 20N, where N is the number of training samples and 20 is the
augmentation number used by the baseline methods. In this evaluation,
we are more concerned with test samples with big head pose variations.
Therefore, we record the successful localisation rate (SLR), i.e. the
percentage of test samples with a mean localisation error smaller than
0.1 IOD. As shown in Fig. 8, the proposed NCA scheme is able to improve
the SLR effectively. Among the 689 test samples, it successfully
localises around 30 more samples. This is very significant given that
the samples on which the methods without NCA fail are already very
difficult.
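The SLR defined above can be computed as in the following Python sketch. The function names are hypothetical, and the normalising distance passed in is assumed to be the inter-ocular distance of each face.

```python
import numpy as np

def normalised_mean_error(pred, gt, norm_dist):
    """Mean Euclidean landmark error divided by a normalising distance."""
    diff = np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)
    per_landmark = np.linalg.norm(diff, axis=-1)
    return per_landmark.mean(axis=-1) / norm_dist

def successful_localisation_rate(errors, threshold=0.1):
    """Fraction of samples with normalised mean error below the threshold."""
    return float(np.mean(np.asarray(errors, dtype=float) < threshold))

# One made-up face with two landmarks and an inter-ocular distance of 50.
gt = np.zeros((2, 2))
pred = np.array([[3.0, 4.0], [0.0, 0.0]])
err = normalised_mean_error(pred, gt, 50.0)  # (5 + 0) / 2 / 50 = 0.05

# Made-up errors for four test samples; two fall below 0.1.
slr = successful_localisation_rate([0.05, 0.2, 0.09, 0.1])  # -> 0.5
```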


5. Conclusion and discussion 

In this paper, we have addressed the problem of localising key
landmarks on sheep and human faces. We proposed a new feature
extraction scheme based on triplet interpolation (TIF), which is more
effective under the conditions of large head pose variation and
landmark sparsity. On our new sheep face dataset of only 600 images,
our proposed method works well on a large diversity of sheep faces. We
also studied the issue of training data imbalance and proposed a sample
augmentation strategy to improve the performance on test samples that
have big variations.

Though we have pushed forward the state of the art in facial landmark
localisation and decreased the number of failures, there are still
failures that are mainly caused by head pose variation or heavy
occlusion. It is an open question whether we need to address these
challenges explicitly or provide more data similar to the failure
cases. Regarding sheep face analysis, we have only localised the
landmarks of interest; there are still many problems to tackle in order
to build an automatic computer vision system to identify sheep in pain.
We believe these are all interesting and valuable problems for both the
computer vision and animal welfare communities.